Supplementary Materials: Shape Registration in the Time of Transformers
In this section, we describe the proposed architecture and its implementation in detail. The architecture consists of an encoder and a decoder. The encoder takes as input a predefined number of learnable latent probes LP, together with the point coordinates of the target point cloud XT. Each encoder layer performs cross-attention between LP and XT, followed by self-attention on LP. Each attention operation is followed by a feed-forward layer.
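To make the data flow of one encoder layer concrete, the following NumPy sketch chains cross-attention, a feed-forward layer, self-attention, and a second feed-forward layer. It is only an illustration under several assumptions that the text does not state: the latent probes LP act as queries while XT supplies keys and values, attention is single-head, each sub-block uses a residual connection, the feed-forward layer is a two-layer ReLU network, and the 3D coordinates are lifted to the model width by a linear embedding. Normalization layers are omitted, and all names and sizes (`encoder_layer`, `d_model`, `n_probes`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries attend to context."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def feed_forward(x, w1, w2):
    # Two-layer ReLU network (assumed form of the feed-forward layer).
    return np.maximum(x @ w1, 0.0) @ w2


def init_layer(rng, d_model, d_hidden):
    def mat(rows, cols):
        return rng.normal(scale=rows ** -0.5, size=(rows, cols))

    return {
        "cross": [mat(d_model, d_model) for _ in range(3)],
        "ff_cross": (mat(d_model, d_hidden), mat(d_hidden, d_model)),
        "self": [mat(d_model, d_model) for _ in range(3)],
        "ff_self": (mat(d_model, d_hidden), mat(d_hidden, d_model)),
    }


def encoder_layer(latents, point_feats, p):
    # Cross-attention: latent probes (queries) read from the point features.
    latents = latents + attention(latents, point_feats, *p["cross"])
    latents = latents + feed_forward(latents, *p["ff_cross"])
    # Self-attention: latent probes exchange information with each other.
    latents = latents + attention(latents, latents, *p["self"])
    latents = latents + feed_forward(latents, *p["ff_self"])
    return latents


rng = np.random.default_rng(0)
n_probes, n_points, d_model, d_hidden = 8, 100, 16, 32  # illustrative sizes

latent_probes = rng.normal(size=(n_probes, d_model))   # LP (learnable in practice)
target_points = rng.normal(size=(n_points, 3))         # XT, 3D coordinates
embed = rng.normal(scale=3 ** -0.5, size=(3, d_model)) # assumed linear embedding
point_feats = target_points @ embed

params = init_layer(rng, d_model, d_hidden)
out = encoder_layer(latent_probes, point_feats, params)
print(out.shape)  # (8, 16)
```

In this sketch the output has one row per latent probe regardless of how many points XT contains, and, because no positional encoding is applied to the points, the result does not depend on the order in which the points are listed.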